Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add extension best practices #338

Open
wants to merge 21 commits into
base: dev
Choose a base branch
from
Open

Conversation

bendichter
Copy link
Contributor

Motivation

Integrate best practices from this PR: NeurodataWithoutBorders/nwb-overview#72

@CodyCBakerPhD
Copy link
Contributor

ReadTheDocs does not like the tabs stuff

/home/docs/checkouts/readthedocs.org/user_builds/nwbinspector/checkouts/338/docs/best_practices/extensions.rst:26: ERROR: Unknown directive type "tabs".

from https://readthedocs.org/api/v2/build/19360519.txt

Comment on lines +112 to +114
* Extend ``TimeSeries`` for storing timeseries data. NWB provides main types of ``TimeSeries``
and you should identify the most specific type of ``TimeSeries`` relevant for your use case
(e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these TimeSeries/ElectricalSeries be :py:class: intersphinx references?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for below on TimeIntervals / DynamicTable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest intersphinx to the nwb-schema docs since this is for extensions

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Attributes should be used mainly to store small metadata (usually less than 64 Kbytes) that
is associated with a particular Group or Dataset. Typical uses of attributes are, e.g., to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reading this, it's hard to grasp 'how big' something is to be less than 64 KB

Since most metadata are strings, maybe it would be useful to mention a relative scale to that?

A standard Python string under 64 KB would correspond to ~1300 characters, from

import sys

sys.getsizeof("a")
> 50

64_000 / 50
> 1280

Google also says the average number of characters per word in English is 4.7, so that's ~272 words per string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly relevant to fields like description, or sometimes reference_frame

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of relating this to characters. Do you know how big characters are within HDF5? Is it the same as Python strings in memory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'll have to investigate that a bit deeper, but most likely not the same as Python... I seem to remember us having some issues related to their byte string types here on the Inspector, but maybe that was for fields that are technically set as datasets, not attributes

Even in Python, if you apply a specific encoding it can change the size quite a bit

sys.getsizeof("a".encode("utf-8"))
> 34

which would put the max closer to ~400 words

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, from https://docs.h5py.org/en/stable/strings.html#storing-strings and https://docs.h5py.org/en/stable/special.html#h5py.string_dtype and manually inspecting some of these attributes with HDFView

It seems HDF5 stores with UTF-8 encoding, so about ~400 words on average from above

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~400 words

words or characters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

words or characters?

words using Google's estimate of average characters per word in English

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just shy of ~1900 characters if that's easier to communicate

* **Use the postfix ``Table`` when extending a ``DynamicTable`` type.** e.g.,
``neurodata_type_def: LaserSettingsTable``
* **Explicit**. E.g., avoid the use of ambiguous abbreviation in names.

Copy link
Contributor

@oruebel oruebel Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Limit flexibility: Consider data reuse and tool developers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
One of the aims of NWB is to make reusing data easier. This means that when proposing an extension you need to put yourself in the shoes of someone who will receive an NWB dataset and attempt to analyze it. Additionally, consider developers that will try to write tools that take NWB datasets as inputs. It’s worth assessing how much additional code different ways of approaching your extension will lead to.

and you should identify the most specific type of ``TimeSeries`` relevant for your use case
(e.g., extend ``ElectricalSeries`` to define a new kind of electrical recording).
* Extend ``DynamicTable`` to store tabular data.
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
* Extend ``TimeIntervals`` to store specific annotations of intervals in time.
Strive for backward compatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NWB is already incorporated in many tools - proposing a change that will make already released NWB datasets non-compliant will cause a lot of confusion and will lead to significant cost to update codes.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By using the :nwb_extension_git:`ndx-template` to create new extensions helps ensure
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
that extensions can be easily shared and reused and published via the :ndx-catalog:`NDX Catalog <>`.
Get the community involved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Try to reach out to colleagues working with the type of data you are trying to add support for. The more eyes you will get on your extension the better it will get.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants